Correction des erreurs orthographiques des systèmes de reconnaissance de l'écriture et de la parole arabe
Internal identifier: 001430 (Main/Exploration)
Authors: Toufik Sari [Algeria]; Mokhtar Sellami [Algeria]
Source:
- Revue Africaine de la Recherche en Informatique et Mathématiques Appliquées [ 1638-5713 ] ; 2004.
Abstract
In this paper, we present two methods for correcting Arabic words produced by text and/or speech recognizers. Both techniques operate as post-processors and are designed to be adaptable. They correct word rejection and substitution errors. The first method is closely tied to the dictionary and is called 'lexicon driven', whereas the second is very general, exploits contextual information, and is called 'context driven'. Properties of the Arabic language are very useful in morpho-lexical analysis, so they were exploited extensively in developing the second method. Substitution errors are expressed as rewriting rules used by a rule-based system. Extensions to other levels of language analysis are considered as future work.
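The lexicon-driven idea can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes the recognizer emits `?` for a rejected character (a hypothetical convention), and it models substitution errors as hypothetical weighted confusion rules between Arabic letters that share a base shape and differ only in their dots (ت/ث/ب, ج/ح). Each dictionary word is scored with a weighted Levenshtein distance, and only cheap candidates are kept. The names `REJECT`, `CONFUSION_COST`, `weighted_distance`, and `correct` are illustrative.

```python
"""Illustrative lexicon-driven post-correction (not the authors' algorithm).

Assumptions: the recognizer emits '?' for a rejected character, and
substitution errors follow hypothetical confusion rules between Arabic
letters that share a base shape and differ only in their dots.
"""

REJECT = "?"  # hypothetical reject marker

# Hypothetical substitution rules: (recognized, intended) -> cost.
CONFUSION_COST = {
    ("ث", "ت"): 0.3, ("ت", "ث"): 0.3,
    ("ب", "ت"): 0.4, ("ت", "ب"): 0.4,
    ("ج", "ح"): 0.3, ("ح", "ج"): 0.3,
}
DEFAULT_SUB_COST = 1.0  # any other letter substitution
INDEL_COST = 1.0        # inserting or deleting a letter


def sub_cost(recognized: str, intended: str) -> float:
    """Cost of reading `recognized` where the lexicon word has `intended`."""
    if recognized == intended or recognized == REJECT:
        return 0.0  # a rejected position is compatible with any letter
    return CONFUSION_COST.get((recognized, intended), DEFAULT_SUB_COST)


def weighted_distance(observed: str, candidate: str) -> float:
    """Weighted Levenshtein distance, computed one row at a time."""
    prev = [j * INDEL_COST for j in range(len(candidate) + 1)]
    for i, o in enumerate(observed, start=1):
        cur = [i * INDEL_COST]
        for j, c in enumerate(candidate, start=1):
            cur.append(min(
                prev[j] + INDEL_COST,          # drop an observed letter
                cur[j - 1] + INDEL_COST,       # restore a missing letter
                prev[j - 1] + sub_cost(o, c),  # match or substitute
            ))
        prev = cur
    return prev[-1]


def correct(observed: str, lexicon: list[str],
            max_cost: float = 0.5) -> list[tuple[str, float]]:
    """Lexicon entries within `max_cost` of `observed`, cheapest first."""
    scored = [(word, weighted_distance(observed, word)) for word in lexicon]
    kept = [pair for pair in scored if pair[1] <= max_cost]
    return sorted(kept, key=lambda pair: (pair[1], pair[0]))


if __name__ == "__main__":
    lexicon = ["كتاب", "كاتب", "مكتب", "كتب"]  # toy lexicon
    print(correct("كثاب", lexicon))               # substitution ث -> ت
    print(correct("ك" + REJECT + "اب", lexicon))  # rejected second letter
```

With this toy lexicon, the misread "كثاب" is corrected to "كتاب" at cost 0.3, and the partially rejected word also resolves to "كتاب" at cost 0.0. The `max_cost` threshold of 0.5 is an arbitrary choice that admits one dot-level confusion but rejects an unrelated substitution; a real system would tune the costs from recognizer confusion statistics.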
Links to previous steps (curation, corpus...)
- to stream Hal, to step Corpus: 000144
- to stream Hal, to step Curation: 000144
- to stream Hal, to step Checkpoint: 000141
- to stream Main, to step Merge: 001733
- to stream Main, to step Curation: 001430
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="fr">Correction des erreurs orthographiques des systèmes de reconnaissance de l'écriture et de la parole arabe</title>
<author><name sortKey="Sari, Toufik" sort="Sari, Toufik" uniqKey="Sari T" first="Toufik" last="Sari">Toufik Sari</name>
<affiliation wicri:level="1"><hal:affiliation type="laboratory" xml:id="struct-5124" status="VALID"><orgName>Laboratoire de Recherche informatique - Badji Mokhtar University</orgName>
<orgName type="acronym">LRI </orgName>
<desc><address><country key="DZ"></country>
</address>
<ref type="url">http://www.univ-annaba.org</ref>
</desc>
<listRelation><relation active="#struct-300650" type="direct"></relation>
</listRelation>
<tutelles><tutelle active="#struct-300650" type="direct"><org type="institution" xml:id="struct-300650" status="VALID"><orgName>Université Badji Mokhtar [Annaba]</orgName>
<desc><address><addrLine>BP 12, 23000, Annaba</addrLine>
<country key="DZ"></country>
</address>
<ref type="url">http://www.univ-annaba.dz/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>Algérie</country>
</affiliation>
</author>
<author><name sortKey="Sellami, Mokhtar" sort="Sellami, Mokhtar" uniqKey="Sellami M" first="Mokhtar" last="Sellami">Mokhtar Sellami</name>
<affiliation wicri:level="1"><hal:affiliation type="laboratory" xml:id="struct-5124" status="VALID"><orgName>Laboratoire de Recherche informatique - Badji Mokhtar University</orgName>
<orgName type="acronym">LRI </orgName>
<desc><address><country key="DZ"></country>
</address>
<ref type="url">http://www.univ-annaba.org</ref>
</desc>
<listRelation><relation active="#struct-300650" type="direct"></relation>
</listRelation>
<tutelles><tutelle active="#struct-300650" type="direct"><org type="institution" xml:id="struct-300650" status="VALID"><orgName>Université Badji Mokhtar [Annaba]</orgName>
<desc><address><addrLine>BP 12, 23000, Annaba</addrLine>
<country key="DZ"></country>
</address>
<ref type="url">http://www.univ-annaba.dz/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>Algérie</country>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">HAL</idno>
<idno type="RBID">Hal:hal-01261705</idno>
<idno type="halId">hal-01261705</idno>
<idno type="halUri">https://hal.inria.fr/hal-01261705</idno>
<idno type="url">https://hal.inria.fr/hal-01261705</idno>
<date when="2004">2004</date>
<idno type="wicri:Area/Hal/Corpus">000144</idno>
<idno type="wicri:Area/Hal/Curation">000144</idno>
<idno type="wicri:Area/Hal/Checkpoint">000141</idno>
<idno type="wicri:doubleKey">1638-5713:2004:Sari T:correction:des:erreurs</idno>
<idno type="wicri:Area/Main/Merge">001733</idno>
<idno type="wicri:Area/Main/Curation">001430</idno>
<idno type="wicri:Area/Main/Exploration">001430</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="fr">Correction des erreurs orthographiques des systèmes de reconnaissance de l'écriture et de la parole arabe</title>
<author><name sortKey="Sari, Toufik" sort="Sari, Toufik" uniqKey="Sari T" first="Toufik" last="Sari">Toufik Sari</name>
<affiliation wicri:level="1"><hal:affiliation type="laboratory" xml:id="struct-5124" status="VALID"><orgName>Laboratoire de Recherche informatique - Badji Mokhtar University</orgName>
<orgName type="acronym">LRI </orgName>
<desc><address><country key="DZ"></country>
</address>
<ref type="url">http://www.univ-annaba.org</ref>
</desc>
<listRelation><relation active="#struct-300650" type="direct"></relation>
</listRelation>
<tutelles><tutelle active="#struct-300650" type="direct"><org type="institution" xml:id="struct-300650" status="VALID"><orgName>Université Badji Mokhtar [Annaba]</orgName>
<desc><address><addrLine>BP 12, 23000, Annaba</addrLine>
<country key="DZ"></country>
</address>
<ref type="url">http://www.univ-annaba.dz/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>Algérie</country>
</affiliation>
</author>
<author><name sortKey="Sellami, Mokhtar" sort="Sellami, Mokhtar" uniqKey="Sellami M" first="Mokhtar" last="Sellami">Mokhtar Sellami</name>
<affiliation wicri:level="1"><hal:affiliation type="laboratory" xml:id="struct-5124" status="VALID"><orgName>Laboratoire de Recherche informatique - Badji Mokhtar University</orgName>
<orgName type="acronym">LRI </orgName>
<desc><address><country key="DZ"></country>
</address>
<ref type="url">http://www.univ-annaba.org</ref>
</desc>
<listRelation><relation active="#struct-300650" type="direct"></relation>
</listRelation>
<tutelles><tutelle active="#struct-300650" type="direct"><org type="institution" xml:id="struct-300650" status="VALID"><orgName>Université Badji Mokhtar [Annaba]</orgName>
<desc><address><addrLine>BP 12, 23000, Annaba</addrLine>
<country key="DZ"></country>
</address>
<ref type="url">http://www.univ-annaba.dz/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>Algérie</country>
</affiliation>
</author>
</analytic>
<series><title level="j">Revue Africaine de la Recherche en Informatique et Mathématiques Appliquées</title>
<idno type="ISSN">1638-5713</idno>
<imprint><date type="datePub">2004</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="mix" xml:lang="en"><term>arabic linguistic</term>
<term>error detection</term>
<term>post-processing</term>
<term>probabilistic rule-based techniques</term>
<term>word correction</term>
<term>Arabic character recognition</term>
</keywords>
<keywords scheme="mix" xml:lang="fr"><term>OCR arabe</term>
<term>analyse morpho-lexicale</term>
<term>base de règles</term>
<term>correction des mots</term>
<term>détection des erreurs</term>
<term>langue arabe</term>
<term>post-traitement</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">In this paper, we present two methods for correcting Arabic words generated by text and/or speech recognizers. These techniques operate as post-processors and they are conceived to be adaptable. They correct rejection and substitution word errors. The former one is very linked to the dictionary and is called 'lexicon driven', when the other is very general exploiting contextual information and called 'context driven'. Arabic language properties are very useful in morpho-lexical analysis and so they were strongly exploited in the development of the second method. Substitution errors are rewritten in rules for being used by a rule based system. The extensions to the other levels of language analysis are considered in perspectives.</div>
</front>
</TEI>
<affiliations><list><country><li>Algérie</li>
</country>
</list>
<tree><country name="Algérie"><noRegion><name sortKey="Sari, Toufik" sort="Sari, Toufik" uniqKey="Sari T" first="Toufik" last="Sari">Toufik Sari</name>
</noRegion>
<name sortKey="Sellami, Mokhtar" sort="Sellami, Mokhtar" uniqKey="Sellami M" first="Mokhtar" last="Sellami">Mokhtar Sellami</name>
</country>
</tree>
</affiliations>
</record>
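The record above can also be read with Python's standard `xml.etree.ElementTree`. A minimal sketch follows, under one stated assumption: the record as exported uses the `hal:` and `wicri:` prefixes without declaring them, which a namespace-aware parser rejects, so the sketch declares them with hypothetical placeholder URIs before parsing. `parse_record`, `summarize`, and `SAMPLE` (a trimmed copy of the record) are illustrative names, not part of Dilib.

```python
import xml.etree.ElementTree as ET

# Assumption: hal: and wicri: are used but never declared in the export.
# These placeholder URIs are hypothetical and only make the prefixes parseable.
PLACEHOLDER_NS = 'xmlns:hal="urn:example:hal" xmlns:wicri="urn:example:wicri"'


def parse_record(xml_text: str) -> ET.Element:
    """Parse a Dilib <record>, declaring the missing prefixes on its root tag."""
    patched = xml_text.replace("<record>", f"<record {PLACEHOLDER_NS}>", 1)
    return ET.fromstring(patched)


def summarize(record: ET.Element) -> dict:
    """Extract the title, author names and HAL identifier from the record."""
    file_desc = record.find("TEI/teiHeader/fileDesc")
    title_stmt = file_desc.find("titleStmt")
    return {
        "title": title_stmt.findtext("title"),
        "authors": [a.findtext("name") for a in title_stmt.findall("author")],
        "halId": file_desc.findtext("publicationStmt/idno[@type='halId']"),
    }


# Trimmed copy of the record, keeping only the elements summarize() reads
# plus one wicri: attribute to exercise the namespace patch.
SAMPLE = """<record><TEI><teiHeader><fileDesc><titleStmt>
<title xml:lang="fr">Correction des erreurs orthographiques des systèmes de reconnaissance de l'écriture et de la parole arabe</title>
<author><name>Toufik Sari</name><affiliation wicri:level="1"><country>Algérie</country></affiliation></author>
<author><name>Mokhtar Sellami</name></author>
</titleStmt>
<publicationStmt><idno type="halId">hal-01261705</idno></publicationStmt>
</fileDesc></teiHeader></TEI></record>"""

if __name__ == "__main__":
    print(summarize(parse_record(SAMPLE)))
```

With a full export, the whole `<record>` text can be passed to `parse_record`; `summarize` only reads the title statement and publication statement, so the affiliation, keyword, and `affiliations` blocks are ignored.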
To work with this document on Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001430 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001430 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= Hal:hal-01261705 |texte= Correction des erreurs orthographiques des systèmes de reconnaissance de l'écriture et de la parole arabe }}
This area was generated with Dilib version V0.6.32.